Introduction


Human  listeners  possess  a  remarkable  ability  to  make  sense  out  of 
multiple  and  diverse  arrays  of  acoustic  information.  At  any  particular 
instant  the  complex  acoustic  waveform  arriving  at  the  ear  will  likely  contain 
information  about  a  variety  of  individual  sound  sources  as  well  as  background 
noise.  Despite  this,  the  experienced  listener  perceives  sounds  from  each  of 
the  separate  sources  individually  (e.g.,  the  dog  barking,  a  Bach  cantata,  the 
telephone  ringing)  rather  than  as  a  nonsensical  hodgepodge. 

This  impressive  human  pattern-recognition  ability  has  been  exploited  in  a 
number  of  Navy  applications  which  involve  acoustic  signal  processing.  In  both 
passive-sonar  and  ship-silencing  operations  complex  acoustic  data  are  analyzed 
for  their  tactical  significance.  In  passive  sonar,  sounds  from  the  underwater 
environment  are  recorded  on  hydrophones,  processed,  and  presented  on  both 
visual  and  acoustic  displays.  Both  long-duration  steady-state  and
brief-duration  transient  signals  arise  from  a  variety  of  sources  as 
illustrated  in  Figure  1.  The  task  of  the  sonar  operator  in  this  context  is  to 
distinguish  the  sources  of  radiated  noise  from  each  other  as  well  as  from  the 
background  of  ambient  and  platform-generated  noise.  Ultimately,  the  identity 
(friend  or  foe?)  and  intention  (threat?)  of  these  noise  sources  must  be 
determined.  The  ability  of  experienced  sonar  technicians  to  make  these 
decisions  is  legendary.  In  ship  silencing  the  objective  is  to  reduce  the 
radiated  noise  produced  by  a  ship  not  simply  in  intensity  but  in 
detectability/classifiability.  Unwanted  sources  of  noise  must  be  identified 
and  eliminated  to  reduce  the  vulnerability  of  the  vessel  to  early  detection  by
enemy  sonar.  The  sonar  and  the  ship-silencing  problems  represent  opposite
sides  of  the  same  perceptual  coin.  On  the  passive-sonar  side  one  searches  for
cues  to  facilitate  detection  and  classification  whereas  on  the  ship-silencing
side  such  cues  are  eliminated  to  make  detection  and  classification  difficult. 

Despite  the  significance  of  the  auditory  recognition  problem, 
surprisingly  little  research  has  investigated  how  human  listeners  perceive 
complex  nonspeech  sounds.  The  present  report  summarizes  a  series  of 
experiments  which  examined  these  processes. 

Background  and  Theory 

Historically,  most  psychoacoustic  research  has  focused  on  the  ability  of 
listeners  to  detect  isolated  pure  tones  of  relatively  brief  duration. 
Although  this  research  has  led  to  significant  advances  in  our  understanding  of 
energy  detection  mechanisms  in  the  peripheral  auditory  system,  it  has 
contributed  little  to  our  understanding  of  how  human  listeners  process  complex 
environmental  sounds.  The  primary  difficulty  of  this  approach  for  complex 
sounds  is  that  it  necessarily  focuses  on  specific  signal  parameters  and 
disregards  any  external  knowledge  which  the  listener  may  bring  to  the  task. 
In  the  contextually-rich  acoustic  world  of  meaningful  environmental  sounds, 
listeners  do  not  ignore  their  experience,  but  rely  on  what  they  know  to 
identify  sound  patterns.  For  example,  an  experienced  sonar  technician  knows  a 
great  deal  about  the  sounds  likely  to  occur,  and  this  knowledge  plays  a  major 
role  in  determining  what  is  heard.  Historically,  too  little  attention  has 
been  paid  to  these  important  "knowledge  sources"  and  their  role  in  auditory
perception.

In  contrast  to  the  traditional  approach,  our  research  program  was  based 
on  the  assumption  that  both  bottom-up  or  data-driven  and  top-down  or 
knowledge-driven  processes  are  important  in  the  perception  of  complex 
nonspeech  sounds.  Bottom-up  processing  is  based  exclusively  on  the
information  available  in  the  acoustic  waveform.  Specific  cues  or  features  are 
present  which  suggest  a  particular  category  or  source  event.  In  contrast, 
top-down  processing  occurs  when  the  perceiver  uses  prior  knowledge  to  generate 
expectancies  of  what  is  likely  to  occur.  This  information  is  not  available  in 
the  signal  itself,  but  is  applied  by  the  perceiver  in  interpretation.  Figure 
2  illustrates  bottom-up  and  top-down  processing  as  it  was  thought  to  occur  for 
underwater  sounds.  Several  levels  of  interpretation  were  proposed  which 
ranged  from  low-level  spectral  parameters  to  high-level  multi-ship 
configurations.  Bottom-up  processing  proceeds  up  through  the  interpretation 
hierarchy  (i.e.,  certain  parameters  imply  particular  harmonics  which  imply 
source  events,  etc.),  whereas  top-down  processing  proceeds  from  higher  to 
lower  levels  (i.e.,  the  expectation  that  a  particular  ship  was  present 
suggests  that  particular  sources  and  harmonics  exist).


The  most  convincing  evidence  that  human  auditory  perception  involves  both 
bottom-up  and  top-down  processing  was  found  in  the  speech  perception 
literature.  Logically,  the  "raw  data"  or  specific  sounds  in  continuous  speech 
are  not  sufficient  to  account  for  language  understanding  since  the  "raw"  input 
is  neither  complete  nor  unambiguous.  Rather,  the  listener  relies  on  prior 
knowledge  of  the  proper  orderings  (syntax)  and  the  meaning  (semantics)  of  the
words  in  the  utterance.  Those  elements  that  can  be  interpreted  unambiguously
provide  constraints  to  facilitate  the  interpretation  of  other  elements.  It  is
common  for  people  to  "hear"  missing  words  or  to  "correct"  mispronounced  words
when  they  occur  in  continuous  speech.  Many  recent  studies  have  demonstrated 
that  listeners  use  a  number  of  knowledge  sources  in  speech  perception  ranging 
from  fairly  specific  item-to-item  syntactic  constraints  to  global  semantic 
considerations  such  as  the  theme  or  title  of  a  story.  Two  speech  scientists, 
Ronald  Cole  and  Jola  Jakimik,  have  recently  concluded  that  "it  is  not  only 
what  we  hear  that  tells  us  what  we  know;  what  we  know  tells  us  what  we  hear" 
(Cole  &  Jakimik,  1978).  The  results  of  a  number  of  our  recent  experiments
have  led  us  to  conclude  that  much  the  same  thing  occurs  in  the  perception  of 
certain  complex  nonspeech  sounds. 

In  the  present  project  we  focused  on  the  perception  of  a  large  class  of 
nonsteady-state  sounds  produced  by  relatively  brief-duration  mechanical  events 
such  as  hatch  closings  or  pump  operations.  Events  of  this  type  generate 
acoustic  transients.  The  human  ability  to  perceive  these  sounds  is  of 
particular  interest  since  current  automatic  signal-processing  devices  have 
limited  utility  for  classification  of  transients,  and  brief-duration  signals
are  not  easily  depicted  on  visual  displays. 

Our  research  demonstrated  that  top-down  processes  are  especially 
significant  for  the  perception  of  most  acoustic  transients.  Such  signals  are 
often  too  brief  for  listeners  to  make  an  extensive  analysis  of  their 
perceptual  features,  and  acoustic  transients  frequently  occur  in  temporal 
successions  that  form  more  complex  transient  patterns.  Since  these  patterns
mirror  the  physical  events  which  produced  them,  the  order  of  transient
components  within  a  pattern  is  not  arbitrary,  but  reflects  the  temporal 
structure  of  the  generating  events.  Although  the  temporal  structure  in 
patterns  of  this  sort  is  clearly  less  rigid  and  well  specified  than  the  syntax 
of  language,  a  consistent  temporal  structure  exists.  Our  research  indicated 
that  listeners  make  use  of  prior  knowledge  of  this  structure  when  processing 
patterns  of  complex  environmental  transients. 

Transient  Pattern  Classification 

We  conducted  a  number  of  experiments  to  investigate  the  role  of  both
knowledge  of  temporal  structure  (i.e.,  syntactic  knowledge)  and  knowledge  of
the  source  events  (i.e.,  semantic  knowledge)  in  two-alternative
(target/nontarget)  classification  of  transient  patterns  (Pubs.  1,  2,  4,  9,
12).  Our  findings  have  shown  that  both  knowledge  sources  can  play  an 
important  role  in  the  classification  of  such  patterns. 

One  experiment  was  designed  to  demonstrate  that  listeners  use  syntactic 
information  to  facilitate  the  classification  of  nonspeech  transient  patterns. 
Listeners  were  required  to  classify  sequences  of  brief-duration  pure  tones  as 
either  "target"  or  "non-target"  patterns.  For  one  group  the  target  patterns 
were  determined  by  a  simple  finite-state  rule  system  or  grammar.  In  contrast, 
another  group  received  target  patterns  which  were  randomly  determined  but 
matched  to  the  structured  targets  in  length.  Consequently,  the  target  set  for 
the  first,  structured  group,  had  a  coherence  of  temporal  order  which  was 
lacking  in  the  target  set  for  the  second,  unstructured  group.  The  results
showed  that  listeners  used  syntactic  pattern  structure  to  their  advantage  in
classifying  simple,  tonal  transient  patterns.  Listeners  who  categorized  the 
syntactically-structured  target  set  performed  substantially  better  than  those 
who  categorized  the  unstructured  set. 
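The contrast between the structured and unstructured target sets can be sketched in a few lines of Python. This is a minimal illustration only: the states, tone frequencies, and transitions in `GRAMMAR` are hypothetical and are not the rule system used in the experiment, and the helper names are invented for this example. It shows how a finite-state grammar yields tone sequences with a constrained temporal order, while a length-matched control draws the same tones at random.

```python
import random

# Hypothetical finite-state grammar: each state lists (tone_hz, next_state) arcs.
# States, frequencies, and transitions are illustrative assumptions only.
GRAMMAR = {
    "S0": [(500, "S1"), (800, "S2")],
    "S1": [(650, "S1"), (1000, "S3")],
    "S2": [(1000, "S3"), (500, "S2")],
    "S3": [(800, "END"), (650, "END")],
}
TONES = sorted({tone for arcs in GRAMMAR.values() for tone, _ in arcs})


def structured_pattern(rng, max_len=8):
    """Walk the grammar from S0 to END, retrying walks that run too long."""
    while True:
        state, pattern = "S0", []
        while state != "END" and len(pattern) <= max_len:
            tone, state = rng.choice(GRAMMAR[state])
            pattern.append(tone)
        if state == "END" and len(pattern) <= max_len:
            return pattern


def is_grammatical(pattern):
    """Return True if the tone sequence is accepted by GRAMMAR."""
    states = {"S0"}
    for tone in pattern:
        states = {
            nxt
            for s in states if s != "END"
            for t, nxt in GRAMMAR[s] if t == tone
        }
        if not states:
            return False
    return "END" in states


def unstructured_pattern(rng, length):
    """Draw tones at random, matched in length to a structured pattern."""
    return [rng.choice(TONES) for _ in range(length)]


rng = random.Random(0)
target = structured_pattern(rng)
control = unstructured_pattern(rng, len(target))
```

Every `target` produced this way passes `is_grammatical`, whereas a `control` pattern of the same length usually does not. That difference in coherence of temporal order is the property the two listener groups differed on.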

Two  other  experiments  were  similar  to  this,  but  listeners  classified 
patterns  of  meaningful,  brief-duration,  complex  environmental  sounds  such  as 
pipe  clangs,  steam  hiss,  and  other  steam-  and  water-related  noise  bursts 
rather  than  tones.  Semantic  knowledge  about  those  events  was  provided  to  some 
listeners  in  the  form  of  explicit  thematic  descriptions  of  the  pattern 
components.  Our  results  revealed  that  this  semantic  knowledge  led  to  improved 
classification  performance  for  interpretable,  temporally-structured  patterns.
However,  it  was  surprising  to  find  that  the  same  semantic  knowledge  impaired
performance  when  dealing  with  temporally-unstructured  patterns.  It  was
conjectured  that  in  classifying  ambiguous  or  unstructured  sounds,  dependence 
on  prior  knowledge  of  natural  pattern  structure  led  to  a  greater  number  of 
false  classifications.  Knowledge  of  temporal  structure  and  knowledge  of  the 
source  events  interacted  in  an  important  way  to  influence  classification 
performance.  Individuals  relied  on  their  knowledge  of  the  sounds  that  were 
likely  to  occur  as  well  as  on  the  specific  perceptual  features  in  the
acoustic  signals. 

Since  the  effects  of  pattern  meaningfulness  can  depend  on  the  listener's 
ability  to  make  sense  out  of  individual  pattern  components,  an  additional 
three-phase  experiment  was  conducted  to  assess  listeners'  ability  to  recognize 
and  to  identify  isolated  environmental  sounds  (Pubs.  7,  11,  13).  The  first
phase  involved  free  identification  of  ten  short-duration  recordings  of
real-world  events.  The  second  phase  required  free  identification  of  five 
multi-element  sequences  composed  of  a  subset  of  the  ten  acoustic  transients. 
These  sequences  were  meaningful  and  represented  sounds  produced  by  opening 
and/or  closing  water  or  steam  valves.  The  third  phase  involved  a 
forced-choice  identification  of  the  ten  transients  using  a  checklist  of 
descriptors.  The  results  showed  that  whereas  some  types  of  sounds  were 
identified  easily  by  most  listeners,  others  were  confused  and  rarely 
identified  correctly.  For  example,  several  metallic  sounds  were  often 
confused  semantically  even  though  they  were  quite  distinct  perceptually.  The 
identification  of  patterns  depended  on  both  the  salience  of  the  individual 
sounds  in  the  pattern  and  the  semantic  relationships  among  the  sounds. 

To  summarize,  we  have  demonstrated  that  many  complex  sound  patterns  have 
both  syntactic  (temporal)  and  semantic  (contextual)  structure  which  is 
determined  by  the  sequence  of  source  events  which  produced  them.  In 
interpreting  such  patterns,  human  listeners  relied  on  their  knowledge  of  these 
factors  as  well  as  on  the  perceptual  cues  available  in  the  sound  itself. 
Although  most  theorists  agree  that  this  occurs  in  the  perception  of  speech, 
the  role  of  these  factors  in  the  classification  of  nonlinguistic  acoustic 
patterns  had  not  been  demonstrated  previously.  Our  findings  have  shown  that 
these  factors  can  play  an  important  role  in  even  relatively  simple
classification  tasks.

Target  Observation


The  pattern  classification  experiments  described  in  the  preceding  section
demonstrated  that  listeners  use  their  knowledge  of  temporal  structure  to
classify  patterns  of  transient  events.  This  raised  the  question  of  how  the
underlying  structure  was  initially  acquired  or  learned  by  listeners.  Three
additional  experiments  investigated  how  individuals  learn  to  classify
sequentially-structured  patterns  of  complex  environmental  sounds  (Pubs.  3,
10).  In  the  first  of  these  experiments  listeners  classified  either  auditory
patterns  or  their  visually-presented  symbolic  analogs  as  targets  or
nontargets.  Some  individuals  received  "observation"  trials  on  which  they
simply  heard  (saw)  examples  of  the  target  patterns  prior  to  classification. 
The  observation  trials  were  shown  to  be  effective  for  target  acquisition,  and 
positive  transfer  occurred  between  symbolic  observation  and  subsequent 
auditory  pattern  classification.  The  results  of  the  two  follow-up  experiments 
suggested  further  that  listeners  acquired  a  global  attentional  strategy  during 
the  observations  and  that  the  generalizability  of  observational  learning  was 
limited  when  structurally  ambiguous  patterns  were  used.  Post-experimental 
tests  suggested  that  individuals  implicitly  learned  something  about  the 
composition  rules  used  to  produce  the  target  patterns  rather  than  simple 
paired-associate  responses.  These  findings  provided  some  insights  into  the 
abstraction  processes  which  enabled  listeners  to  construct  an  internal
representation  of  the  temporal  or  sequential  structure  of  nonlinguistic
transient  patterns.


Attentional  Focusing


The  preceding  experiments  established  that  top-down  processes  play  an 
important  role  in  the  perceptual  processing  of  acoustic  transients.  This  has 
two  important  implications.  First,  top-down  processing  may  lead  to  improved 
performance  when  it  helps  the  individual  focus  attention  on  a  particular 
pattern  component.  Such  attentional  focusing  can  lead  to  an  enhanced  ability 
to  resolve  individual  elements  from  multi-component  acoustic  patterns. 
Second,  top-down  processing  may  impair  performance  whenever  it  leads  the 
listener  to  "hear"  pattern  elements  which  are  not  physically  present  in  the 
acoustic  signature.  Although  such  auditory  induction  is  helpful  in 
interpreting  highly-redundant  signals  such  as  speech,  misinterpretation  can 
occur  when  listeners  must  detect  individual  components  in  noisy  underwater 
sound  patterns. 

An  important  issue  addressed  in  the  attentional  focusing  experiments  was 
the  ability  of  listeners  to  use  the  earlier  pattern  components  as  cues  to 
listen  selectively  for  later  elements.  We  carried  out  a  series  of  experiments 
in  which  listeners  were  presented  with  twelve-element  tonal  patterns  at  a  low 
signal-to-noise  ratio  (Pubs.  5,  14,  15).  Two  presentations  of  a  pattern 
occurred  on  each  of  a  series  of  trials.  One  presentation  of  the  pattern  was 
complete  whereas  the  other  was  missing  the  eleventh  (primary)  tone.  The 
listeners  judged  which  of  two  consecutive  presentations  of  the  pattern  had  the 
eleventh  element  missing  and  which  was  complete.  On  20%  of  the  test  trials,
the  eleventh  component  of  the  complete  pattern  was  replaced  with  one  of  four 
"probe"  tones. 
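The trial structure described above can be sketched as follows. This is an illustration under stated assumptions, not the experimental software: the tone frequencies and the function name `make_trial` are hypothetical, and the probe proportion is taken to be 20 percent of trials. Each trial holds two presentations of a twelve-element pattern in random order, one complete and one with the eleventh element omitted. On probe trials the eleventh element of the complete presentation is swapped for one of four probe tones.

```python
import random

PATTERN_LEN = 12
PRIMARY_INDEX = 10   # the eleventh element, zero-based
PROBE_RATE = 0.20    # assumed proportion of probe trials


def make_trial(pattern, probes, rng):
    """Build one two-interval trial; None marks the omitted eleventh tone."""
    complete = list(pattern)
    is_probe = rng.random() < PROBE_RATE
    if is_probe:
        # Replace the expected primary tone with a low-probability probe.
        complete[PRIMARY_INDEX] = rng.choice(probes)
    incomplete = list(pattern)
    incomplete[PRIMARY_INDEX] = None
    intervals = [complete, incomplete]
    rng.shuffle(intervals)
    return {
        "intervals": intervals,
        "missing_interval": intervals.index(incomplete),  # correct answer
        "is_probe": is_probe,
    }


# Hypothetical stimulus values in Hz, chosen only for illustration.
pattern = [600 + 50 * i for i in range(PATTERN_LEN)]
probes = [900, 950, 1250, 1300]
trial = make_trial(pattern, probes, random.Random(1))
```

Comparing accuracy on probe trials with accuracy on ordinary trials is what allows sensitivity to the expected primary tone to be compared with sensitivity to the unexpected probes.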


Our  findings  indicated  that  listeners  were  more  sensitive  to  the  primary 
tone  than  to  the  probe  tones.*  This  suggested  that  as  listeners  acquired 
experience  with  the  patterns,  they  used  the  earlier  components  as  cues  to 
predict  the  later  elements.  This  resulted  in  a  selective  increase  in 
sensitivity  to  the  expected  component  with  a  contrasting  inability  to  hear 
unexpected  elements  (i.e.,  the  low-probability  probes).  Furthermore,  this
sensitivity  changed  moment-to-moment  as  a  function  of  the  attentional  cues 
provided  by  earlier  pattern  components. 

These  results  suggested  that  the  early  components  serve  two  distinct  cue 
functions:  (1)  an  "informational"  function  that  provides  information
regarding  which  primary  tone  was  likely  to  occur  on  a  given  trial,  and  (2)  a
"frequency"  function  that  automatically  directs  listening  to  an  appropriate
frequency  range  and  narrows  or  "fine  tunes"  the  listening  band.  The
informational  function  was  demonstrated  since  "off-pattern"  primaries  (i.e.,
high-probability  test  signals  at  a  frequency  which  differed  from  that  of  the 
early  components)  as  well  as  "on-pattern"  primaries  (i.e.,  high-probability 
test  signals  at  a  frequency  consistent  with  that  of  the  early  components)  were 
detected  more  reliably  than  the  low-probability  probes.  The  frequency 
function  of  the  pattern  cues  was  demonstrated  since  a  greater  sensitivity 
advantage  was  observed  for  the  on-pattern  primaries  than  for  the  off-pattern 
primaries.  Specifically,  when  the  early  pattern  components  provided  a 
consistent  informational  cue  but  inappropriate  frequency  cue  for  the  test 
signal,  degraded  listening  performance  was  obtained.

Induction


In  another  series  of  experiments  we  examined  the  second  implication  of 
top-down  processing  described  in  the  preceding  section — auditory  induction. 
Auditory  induction  occurs  when  sufficient  contextual  cues  exist  for  listeners 
to  perceive  or  "induce"  implied  signals  which  are  not  physically  present 
(Pubs.  6,  16).  In  an  auditory  detection  context  induction  leads  to  impaired 
performance  when  listeners  report  hearing  targets  which  are  not  actually 
present.


In  three  experiments,  listeners  were  asked  to  detect  signals  embedded  in 
noise  bursts  which  were  preceded  and  followed  by  an  acoustic  context  designed 
to  be  either  consistent  or  inconsistent  with  the  to-be-detected  signal.  The 
signals  were  200  ms  pure  tones  of  constant,  rising,  or  falling  frequency.  The 
contextual  flanking  tones  which  preceded  and  followed  the  noise  burst  were 
either  continuous  or  discontinuous  in  frequency  with  the  to-be-detected  target 
tone.  For  example,  a  signal  of  rising  frequency  would  be  "in-context"  with 
low  frequency  preceding  and  high  frequency  following  flanker  tones.  On  the 
other  hand,  the  same  rising  frequency  signal  would  be  "out-of-context"  with 
high  frequency  preceding  and  low  frequency  following  flankers. 
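A short Python sketch may help make the stimulus conditions concrete. Only the 200 ms signal duration comes from the text. The sample rate, the example frequencies, and the rule that an in-context flanker exactly matches the adjacent endpoint of the signal are assumptions made for illustration, and `glide` and `is_in_context` are hypothetical names.

```python
import math

SAMPLE_RATE = 16000   # Hz; assumed, not stated in the text
SIGNAL_DUR = 0.200    # s; the 200 ms signal duration from the text


def glide(start_hz, end_hz, dur=SIGNAL_DUR, rate=SAMPLE_RATE):
    """Synthesize a linear frequency glide by accumulating phase per sample.

    Equal start and end frequencies give a constant-frequency tone.
    """
    n = int(round(dur * rate))
    samples, phase = [], 0.0
    for i in range(n):
        freq = start_hz + (end_hz - start_hz) * i / n
        samples.append(math.sin(phase))
        phase += 2 * math.pi * freq / rate
    return samples


def is_in_context(pre_hz, start_hz, end_hz, post_hz):
    """Assumed rule: flankers are in-context when they are continuous in
    frequency with the signal, i.e. they meet it at both endpoints."""
    return pre_hz == start_hz and post_hz == end_hz


rising = glide(800, 1200)                              # rising-frequency signal
in_context = is_in_context(800, 800, 1200, 1200)       # low before, high after
out_of_context = is_in_context(1200, 800, 1200, 800)   # high before, low after
```

With these example values the rising signal is in-context for the low-then-high flanker pair and out-of-context for the high-then-low pair, matching the example given in the text.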


The  results  of  the  first  two  experiments  showed  that  detection  of 
in-context  signals  was  associated  with  an  increased  false-alarm  rate  and 
lowered  sensitivity  relative  to  out-of-context  signals.  In  the  third 
experiment  the  signal  and  the  contextual  flanking  tones  were  presented  to 
different  ears  to  eliminate  the  possibility  that  peripheral  masking
contributed  to  this  result.  The  same  pattern  of  selective  contextual
impairment  of  detection  performance  was  obtained. 
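The relationship between false-alarm rate and sensitivity reported above can be illustrated with the sensitivity index d' from signal detection theory. The report does not say which sensitivity measure was used, so d' is an assumption here, and the response counts below are invented purely to show the direction of the effect. They are not data from the experiments.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return (d', false-alarm rate) for one condition.

    d' = z(hit rate) - z(false-alarm rate). A log-linear correction
    (add 0.5 to counts, 1 to totals) keeps both rates away from 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate), fa_rate


# Hypothetical counts: the same hit rate, but more false alarms in-context.
d_in, fa_in = d_prime(hits=70, misses=30, false_alarms=40, correct_rejections=60)
d_out, fa_out = d_prime(hits=70, misses=30, false_alarms=20, correct_rejections=80)
```

With equal hit rates, the condition with more false alarms yields the lower d'. This is the pattern described for in-context signals: listeners report more signals that are not there, and measured sensitivity drops.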

Overall,  the  results  indicated  that  auditory  induction  occurs  with 
discrete  target  signals.  In  other  words,  the  contextual  tones  were  perceived 
as  continuing  through  the  noise  bursts  and  this  illusion  degraded  detection 
performance.  However,  the  effect  occurred  only  when  the  target  signal  was 
consistent  with  its  acoustic  context.  Sufficient  top-down  processing  occurred 
for  listeners  to  "hear"  the  in-context  signals  even  when  they  were  not 
actually  present. 

Performance  Aids 

A  question  of  practical  significance  concerns  the  design  of  interactive 
operator  performance  aids  based  on  our  understanding  of  how  people  process 
acoustic  transients.  The  most  recent  experiment  carried  out  on  this  project 
investigated  this  issue  (Pubs.  8,  17).  Specifically,  the  experiment  explored 
how  the  interface  design  of  a  computerized  database  system  influenced  the  use 
of  that  system  by  novice  (non-technical)  users.  Alternative  displays  or 
performance  aids  were  constructed  to  provide  different  "conceptual  models"  of 
the  system.  Training  with  a  conceptual  model  affected  the  user's  "mental 
model"  of  the  system,  that  is,  their  knowledge  or  understanding  of  it.  This, 
in  turn,  led  to  substantial  differences  in  performance  when  the  display  was 
used  as  an  aid  in  classification. 

Listeners  learned  to  search  a  two-dimensional  database  (sounds  varying  in
pitch  and  loudness)  with  one  of  two  conceptual  models  of  the  system
(analogical  or  abstract).  These  models  involved  different  graphical
representations  of  the  database  which  were  presented  on  a  video  display. 
After  training,  listeners  were  tested  on  both  the  two-dimensional  and  an 
extended  three-dimensional  database  (sounds  varying  in  pitch,  loudness,  and 
duration),  using  a  neutral,  verbal  display  with  no  graphical  component. 


The  findings  indicated  that  the  performance  aids  provided  the  users  with 
an  initial  mental  representation  of  the  database  system.  Users  with  the 
analogical  graphical  aid  performed  better  on  the  two-dimensional  database,  whereas
users  with  the  abstract  graphical  aid  performed  better  on  the  three-dimensional
database.  These  results  indicate  that  incorporating  conceptual  models  as
performance  aids  in  interface  design  influences  novices'  performance  with  a 
simple  perceptual  database  system. 


Conclusions  and  Implications 


Theoretical  Significance.  It  is  clear  from  this  research  that  human 
listeners  depend  heavily  on  their  knowledge  of  the  acoustic  environment  in 
both  classifying  and  detecting  complex  nonlinguistic  signals.  These  knowledge 
sources  include  syntactic  knowledge  of  the  temporal  structure  or  orderings  of 
sounds  as  well  as  semantic  knowledge  about  the  source  events  likely  to  have 
produced  them.  As  demonstrated  on  this  project,  the  implications  of  this 
top-down  processing  can  be  significant  for  the  classification  of  acoustic 
transient  patterns  or  isolated  meaningful  sounds  as  well  as  for  the  detection 
of  contextually-embedded  targets.  Without  considering  the  external  knowledge
which  listeners  bring  into  an  auditory  perception  task,  the  processes  involved
in  interpreting  complex  meaningful  environmental  sounds  cannot  be  understood. 

An  important  implication  of  this  conclusion  is  that  traditional, 
bottom-up-oriented  approaches  to  psychoacoustics  will  not  lead  to  an 
understanding  of  how  contextually-embedded  or  meaningful  nonlinguistic  signals 
are  perceived.  As  in  the  case  of  speech,  the  information  present  in  the 
signal  itself  is  insufficient  to  account  for  what  is  heard.  For  example,  in 
our  attentional  focusing  experiments,  the  signal  context  was  more  important  in 
determining  signal  audibility  than  was  the  signal  intensity,  and  in  our 
auditory  induction  studies,  listeners  heard  contextually-plausible  signals 
which  were  not  actually  present.  The  most  productive  strategy  for 
investigating  the  perception  of  these  sounds  is  (1)  to  identify  the 
information  or  knowledge-sources  actually  used  by  listeners  in  detection  or 
classification,  and  (2)  to  determine  the  relative  importance  of  both  top-down 
and  bottom-up  processes  for  various  task  and  signal  conditions.  The  studies 
carried  out  on  this  project  have  demonstrated  the  effectiveness  of  this 
approach. 

Naval  Relevance.  We  live  in  an  age  of  technology — an  age  of 
ever-increasing  automation.  A  natural  question  to  pose  in  this  context  is 
"why  study  basic  human  perceptual  capabilities  when  the  human  is  likely  to  be 
automated  out  of  the  system?"  After  all,  intelligent,  knowledge-based 
computers  are  reliable,  adaptable  systems  which  are  especially  well-suited  to 
tedious  surveillance  and  monitoring  tasks.  In  our  view,  the  answer  to  this 
pragmatic  question  lies  in  recognizing  the  value  of  basic  human  research  to
the  automation  development  effort.  Few  current  pattern  recognition  systems
are  fully  automated.  Rather,  most  incorporate  some  combination  of  human
operator  and  expert  system  in  an  interactive  environment.  An  optimized  system 
will  achieve  a  division  of  labor  which  is  best  suited  to  the  capabilities  of 
both  the  human  and  the  computer  system. 

Basic  perceptual  research  can  contribute  to  this  in  two  ways.  First, 
once  the  processes  which  underlie  the  human's  listening  capability  are 
understood,  preprocessing  performance  aids  can  be  developed  to  facilitate  the 
operator's  task.  Second,  in  most  cases  the  very  design  of  expert  systems 
depends  in  some  way  on  understanding  how  a  human  performs  the  task.  For 
example,  H.  Penny  Nii  and  Edward  Feigenbaum  of  Stanford  University  have
developed  an  intelligent,  knowledge-based  expert  system  called  HASP  (SU/X  in
its  earlier  versions),  for  use  in  machine-aided  interpretation  of  data  from  an
extensive  underwater  surveillance  system  (Nii  &  Feigenbaum,  1978;  1982).  The
system  is  based  on  the  HEARSAY-II  speech  understanding  system  (Erman,
Hayes-Roth,  Lesser,  &  Reddy,  1980),  and  is  designed  to  identify  and  track
multiple-ship  targets  by  monitoring  spectral  data  over  time,  much  as  a  sonar
analyst  monitors  a  sonogram  display.  Nii  and  Feigenbaum  have  argued
convincingly  that  traditional  statistical  signal  processing  methods  are
ineffective  in  this  complex  environment;  what  is  critical  is  that  the  system
have  a  knowledge  base  or  level-of-expertise  similar  to  that  of  the  experienced 
human  technician.  Given  this  expertise,  the  HASP  system  can  use  some  of  the 
rules  or  heuristics  employed  by  experienced  analysts  in  data  interpretation. 
As  in  other  expert  systems,  the  knowledge  base  for  this  system  was  developed 
through  an  extensive  analysis  of  the  procedures  followed  by  expert  human
technicians.  HASP  has  achieved  impressive  success  with  large  amounts  of  real
data  at  poor  signal-to-noise  ratios.  In  its  current  implementation,  the  raw
data  input  to  the  system  (equivalent  to  sonogram  lines  and  line
parameters)  are  encoded  by  human  operators  for  input  to  HASP.  A  somewhat  more
complex,  but  similar  expert  system  for  acoustic  signal  processing  has  been
developed  by  Maksym  and  his  colleagues  (Maksym,  Bonner,  Dent,  &  Hemphill,
1983).  Basic  research  such  as  that  described  in  this  report  will  be  of  use  in
developing  fully-  or  partially-automated  systems  for  sonar  analysis.

Research  Apprentice  Program.  We  participated  in  the  Minority  High 
Schooler  Research  Apprentice  Program  throughout  the  four  years  of  this
project.  Four  individuals,  all  students  at  Archbishop  Carroll  High  School 
adjacent  to  the  Catholic  University  campus,  were  employed  on  the  project.  The 
students  assisted  in  a  variety  of  laboratory  tasks,  but  focused  primarily  on 
computer  programming  projects  since  they  all  had  interests  in  this  area.  The 
students  were  selected  on  the  basis  of  interviews  after  a  preliminary 
screening  by  the  school's  guidance  director.  The  major  criteria  used  for 
selection  included  a  strong  interest  in  science,  good  academic  achievement, 
and  a  high  level  of  motivation.  The  students  adapted  well  to  the  laboratory 
environment  and  they  got  along  well  with  our  undergraduate  and  graduate 
students  and  with  other  staff.  All  four  students  continued  their  education 
with  an  undergraduate  concentration  in  computer  science  at  major  universities. 
Vincent  Harrison  and  Joseph  Jennifer  are  both  Juniors  at  the  University  of 
Virginia,  Earl  Mitchell  is  a  Freshman  at  The  Massachusetts  Institute  of
Technology,  and  Patrick  Outlaw  is  a  Freshman  at  Howard  University.

Directions  for  Future  Research

A  number  of  areas  for  future  research  were  indicated  by  our  findings. 
First,  several  direct  extensions  of  the  studies  reported  here  would  be  of 
value.  For  example,  our  attentional  focusing  experiments  revealed  that  both 
informational  and  frequency  cues  were  significant  in  selective  listening;
however,  the  relative  importance  of  these  two  types  of  cues  was  not
determined.  We  have  pilot  data  which  indicate  that  informational  cues  are 
ineffective  without  supplementary  information.  Similarly,  the  conditions 
which  determine  the  strength  of  induced,  "phantom"  signals  should  be  explored. 

Second,  an  issue  of  primary  importance  concerns  the  ability  of  listeners 
to  segment  complex  acoustic  signatures  into  separate,  interpretable  sound 
sources.  Bregman  has  referred  to  this  as  auditory  stream  segregation  or 
streaming  (Bregman,  1978).  Many  recent  studies  have  looked  at  auditory 
streaming  in  simple  rhythmic-  or  in  complex  musical-patterns,  but  the 
segmentation  of  complex  environmental  sounds  has  not  been  addressed.  The 
experiments  carried  out  on  this  project  provide  a  good  foundation  for 
investigating  auditory  stream  segregation  for  complex  patterns  of  this  type. 


Third,  although  the  present  project  addressed  issues  primarily  of 
relevance  to  passive  sonar,  a  range  of  similar  issues  should  be  explored  for
active  sonar.  In  active  sonar,  a  brief  pulse  is  reflected  off  an  object  to
determine  its  properties  by  analyzing  the  reflected  signature.  Pilot  studies 
in  our  laboratory  indicated  that  under  some  conditions,  human  listeners  can 
learn  a  great  deal  about  unknown  objects  by  listening  to  pulsed  signatures  of
this  sort.  A  more  thorough  understanding  of  when  and  how  this  occurs  would  be
of  both  theoretical  and  practical  value. 


The  author  thanks  all  of  the  individuals  listed  in  Appendix  B  for  their 
important  contribution  to  this  research  with  special  thanks  to  Jim  Ballas,
Kevin  Bennett,  and  Janet  McLeod  for  their  long  association  with  the  project
and  their  significant  contribution.  I  am  especially  grateful  to  John  O’Hare,
the  scientific  monitor  for  the  project,  for  his  encouragement,  sharp  and
relevant  criticism,  and  continued  interest  in  the  research.